10 research outputs found

    Software comparison for clinical Named Entity Recognition (NER): A phase-1 study for developing a computer assisted medical claims billing and coding system

    Claims billing and coding is non-trivial for health care providers. Accurate coding helps medical providers obtain the reimbursements they deserve for their professional services. Meanwhile, incorrect coding (e.g. up-coding) is considered by authorities to be one of the most serious forms of fraud and carries severe penalties. Therefore, accurate coding is of great importance to medical professionals. However, claims coding is challenging. Besides knowledge of the E/M coding system, accurate coding requires an adequate depiction of patient health conditions and treatments, part of which is contained in unstructured clinical notes, e.g. discharge summaries and physician notes. We aim to develop a coding decision support system by leveraging state-of-the-art natural language processing (NLP) techniques and algorithms. The expected result of the project is an effective system that can extract essential information for claims coding from real clinical narratives. This phase-1 study compared five popular existing NLP software packages on named entity recognition, using 108 publicly available transcribed medical discharge summary notes from MTsamples.com. Qualitative comparison finds that CLAMP, Amazon Comprehend Medical, and cTAKES are more powerful. Quantitative analysis shows that CLAMP is more accurate and efficient than Amazon Comprehend Medical. Future work includes integrating a section segmentation tool before NER, as well as testing and implementing the system in a clinical scenario.

    Collaborative Cloud Computing Framework for Health Data with Open Source Technologies

    The proliferation of sensor technologies and advancements in data collection methods have enabled the accumulation of very large amounts of data. Increasingly, these datasets are considered for scientific research. However, designing a system architecture that achieves high performance in terms of parallelization, query processing time, and aggregation of heterogeneous data types (e.g., time series, images, structured data, among others), together with the difficulty of reproducing scientific research, remains a major challenge. This is particularly true for health sciences research, where systems must be i) easy to use, with the flexibility to manipulate data at the most granular level, ii) agnostic of programming language kernel, iii) scalable, and iv) compliant with the HIPAA privacy law. In this paper, we review the existing literature on such big data systems for scientific research in health sciences and identify the gaps in the current system landscape. We propose a novel architecture for a software-hardware-data ecosystem using open source technologies such as Apache Hadoop, Kubernetes and JupyterHub in a distributed environment. We also evaluate the system using a large clinical data set of 69M patients. Comment: This paper is accepted in ACM-BCB 202

    Estimating a rural-urban PCP workload disparity: caring for smokers

    Background: Smokers are concentrated in rural America. CDC reports that 28.5% of rural Americans smoke versus 25.1% of urban Americans. The workload impact of those additional smokers on a rural primary care practice has not been investigated. We hypothesize that the workload difference associated with caring for rural smokers will be greater than the 3.4% suggested by the smoking rate difference. We will calculate primary care physician workload differences based on the number of rural versus urban smoker comorbidities. Defining physician workload by the number of comorbidities being managed is novel. Given that payers are tying disease management metrics to payment, calculating primary care workload by comorbidities managed is salient and illuminates real-world primary care workload differences. Methods: We hold constant the number of patients in a typical primary care panel (2500) to estimate the volume of smokers in a rural practice (28.5% of 2500 = 712.5) and in an urban practice (25.1% of 2500 = 627.5). We use the Cerner Health Facts Data Base to determine rates of comorbidities among patients designated as smokers from 1/1/2010 to 9/18/2017 (n = 7,757,949; rural = 1,337,423, urban = 6,420,526). We estimate smoker-related comorbidities using the rates of rural and urban patients with 1, 2, 3 or 4+* comorbidities and multiply each rate by the rural/urban smoker volume. For example, of the 712.5 smokers in a rural practice, 14.73% have 3 comorbidities, resulting in 314.85 comorbidities (712.5 * .1473 * 3 = 314.85). We total the estimated numbers of comorbidities and compare rural and urban. Results: Using 2500 patients in a patient panel, we estimate that rural primary care physicians care for 85 more smokers than their urban counterparts. Due to higher comorbidity rates among those smokers, it is estimated that rural primary care physicians manage 319.54 more comorbidities (2,367.07 rural smoker comorbidities, 2,047.53 urban smoker comorbidities), constituting a 15.6% comorbidity management workload increase associated with caring for smokers. Conclusions: The 3.4% rural-urban smoking rate difference falls short of telling the story of how smokers impact physician workload differently in rural and urban practices. We estimate that the smoker-associated physician workload (comorbidity management) in a rural primary care practice is about 16% greater than in an urban practice. This demonstrates a sizeable workload disparity between rural and urban primary care physicians. We encourage the review of other patient populations to better understand rural primary care workload inflation. *Patients with more than 4 comorbidities were aggregated into the 4+ category; even if they had more comorbidities, only 4 were counted per patient. Therefore, comorbidity rate differences may be greater or less than reported. Since rural patients are sicker, the assumption is that comorbidity management differences are likely underreported in this study.
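
    The arithmetic quoted in this abstract can be re-traced with a few lines of Python. This is only a sketch that recomputes the figures stated above (panel size, smoking rates, the 3-comorbidity example, and the two reported totals); the variable names are illustrative, and the per-category comorbidity rates other than 14.73% are not given in the abstract, so the totals are taken as reported rather than re-derived.

```python
# Re-trace the workload arithmetic using only numbers stated in the abstract.
PANEL_SIZE = 2500
RURAL_SMOKING_RATE = 0.285   # 28.5% of rural Americans smoke
URBAN_SMOKING_RATE = 0.251   # 25.1% of urban Americans smoke

rural_smokers = PANEL_SIZE * RURAL_SMOKING_RATE
urban_smokers = PANEL_SIZE * URBAN_SMOKING_RATE
extra_rural_smokers = rural_smokers - urban_smokers

# Worked example from the abstract: rural smokers with exactly 3 comorbidities.
RURAL_SHARE_WITH_3 = 0.1473
comorbidities_from_3 = rural_smokers * RURAL_SHARE_WITH_3 * 3

# Totals as reported (not re-derived; the other category rates are not given).
RURAL_TOTAL_COMORBIDITIES = 2367.07
URBAN_TOTAL_COMORBIDITIES = 2047.53
extra_comorbidities = RURAL_TOTAL_COMORBIDITIES - URBAN_TOTAL_COMORBIDITIES
relative_increase = extra_comorbidities / URBAN_TOTAL_COMORBIDITIES

print(f"rural smokers: {rural_smokers:.1f}, urban smokers: {urban_smokers:.1f}")
print(f"extra rural smokers: {extra_rural_smokers:.1f}")
print(f"comorbidities from the 3-comorbidity group: {comorbidities_from_3:.2f}")
print(f"extra comorbidities: {extra_comorbidities:.2f}")
print(f"relative workload increase: {relative_increase:.1%}")
```

    The relative increase is computed against the urban total, which is the reading consistent with the 15.6% figure reported in the abstract.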

    An Ellipsoidal Bounding Scheme for the Quasi-Clique Number of a Graph


    Pd(II)/Lewis Acid Catalyzed Intramolecular Annulation of Indolecarboxamides with Dioxygen through Dual C–H Activation

    Transition-metal-ion-catalyzed intramolecular dual C–H activation for constructing polycyclic heteroarene skeletons is valued for its step and atom economy in organic synthesis. However, in most cases the reaction requires stoichiometric oxidants, elevated temperature, and other harsh conditions, which limit its synthetic applications. Herein, we report a Pd(II)/LA (LA: Lewis acid) catalyzed intramolecular dual C–H activation to construct indoloquinolinone derivatives under mild conditions with dioxygen as the sole oxidant. It was found that adding an LA such as Sc3+ to Pd(OAc)2 sharply improved its catalytic efficiency, whereas Pd(OAc)2 alone was very sluggish. The activity improvement was attributed to the linkage of the Sc3+ cation to the Pd(II) species through a diacetate bridge, which significantly enhanced the electrophilic properties of Pd(II) for dual C–H activation.

    Observing the Agostic Hydrogen in Pd(II)-Catalyzed Aromatic C–H Activation

    Direct C–H activation and functionalization offer a convenient protocol for pharmaceutical and material syntheses. Although various mechanisms have been proposed to describe transition-metal-catalyzed C–H activation, the key agostic hydrogen intermediate shared by several major mechanisms has not yet been observed, which hampers mechanism-based catalyst design. This work reports the direct observation of this intermediate in Pd(II)/Sc(III)-catalyzed C–H activation of acetanilides and investigates its stability and reactivity in C–H activation. Remarkably, this intermediate is only observed in electron-rich acetanilides, and a meta-substituent with an increased σm constant generally accelerates C–H activation, a characteristic of the base-assisted C–H activation mechanism. This study characterizes the intermediate and provides a first-hand understanding of its physicochemical properties, shedding new light on mechanism-based catalyst design.

    SNP variable selection by generalized graph domination.

    BACKGROUND: High-throughput sequencing technology has revolutionized both medical and biological research by generating exceedingly large numbers of genetic variants. The resulting datasets share a number of common characteristics that might lead to poor generalization capacity. Concerns include noise accumulated due to the large number of predictors, sparse information arising from the p≫n problem, and overfitting and model mis-identification resulting from spurious collinearity. Additionally, complex correlation patterns are present among variables. As a consequence, reliable variable selection techniques play a pivotal role in predictive analysis, generalization capability, and robustness in clustering, as well as in the interpretability of the derived models. METHODS AND FINDINGS: K-dominating set, a parameterized graph-theoretic generalization model, was used to model SNP (single nucleotide polymorphism) data as a similarity network and to search for representative SNP variables. In particular, each SNP was represented as a vertex in the graph, and (dis)similarity measures such as correlation coefficients or pairwise linkage disequilibrium were estimated to describe the relationship between each pair of SNPs; a pair of vertices is adjacent, i.e. joined by an edge, if the pairwise similarity measure exceeds a user-specified threshold. A minimum k-dominating set in the SNP graph was then defined as the smallest subset such that every SNP excluded from the subset has at least k neighbors among the selected ones. The strength of k-dominating set selection in identifying independent variables, and in culling representative variables that are highly correlated with others, was demonstrated on a simulated dataset. The advantages of k-dominating set variable selection were also illustrated in two applications: pedigree reconstruction using SNP profiles of 1,372 Douglas-fir trees, and species delineation for 226 grasshopper mouse samples. C++ source code that implements SNP-SELECT and uses the Gurobi optimization solver for k-dominating set variable selection is available (https://github.com/transgenomicsosu/SNP-SELECT).
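
    The graph construction and the k-dominating set condition described in this abstract can be illustrated with a small Python sketch. Note the assumptions: this is not the SNP-SELECT implementation, which according to the abstract is written in C++ and uses the Gurobi solver; a simple greedy heuristic is swapped in here, so the returned set is valid but not necessarily minimum. The function names and the toy similarity matrix are hypothetical.

```python
def build_similarity_graph(similarity, threshold):
    """Join two variables by an edge if their similarity exceeds the threshold."""
    n = len(similarity)
    adjacency = {v: set() for v in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] > threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def is_k_dominating(adjacency, selected, k):
    """Every vertex outside `selected` must have at least k selected neighbors."""
    return all(
        len(adjacency[v] & selected) >= k
        for v in adjacency
        if v not in selected
    )


def greedy_k_dominating_set(adjacency, k):
    """Greedy heuristic (an assumption, not the paper's exact solver)."""
    selected = set()
    # need[v]: how many more selected neighbors v requires while unselected.
    need = {v: k for v in adjacency}

    def gain(u):
        # Selecting u clears its own requirement and helps each unselected
        # neighbor that still needs selected neighbors.
        helped = sum(1 for w in adjacency[u] if w not in selected and need[w] > 0)
        return need[u] + helped

    while any(need[v] > 0 for v in adjacency if v not in selected):
        candidates = sorted(v for v in adjacency if v not in selected)
        best = max(candidates, key=gain)  # ties go to the smallest index
        selected.add(best)
        for w in adjacency[best]:
            if need[w] > 0:
                need[w] -= 1
    return selected


# Toy data: variables 0-2 are mutually similar, variable 3 is independent.
toy_similarity = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.85, 0.2],
    [0.8, 0.85, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
toy_graph = build_similarity_graph(toy_similarity, threshold=0.5)
representatives = greedy_k_dominating_set(toy_graph, k=1)
print(sorted(representatives))
```

    On the toy data, one representative stands in for the correlated group while the independent variable is kept on its own, which mirrors the behavior the abstract attributes to k-dominating set selection. An exact minimum set would require an optimization solver, as in the original work.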